ON GAME-LEARNING THEORY
AND SOME DECISION-MAKING EXPERIMENTS


Merrill M. Flood

P-346
17 November 195?


Approved for OTS release



The RAND Corporation
1700 Main St.
Santa Monica


TABLE OF CONTENTS


1.  Introduction

2.  Generalities

3.  The problem

4.  The game-learning model

5.  The fusion game-learning model

6.  A special fusion model

7.  A rat experiment

8.  Human subjects

9.  Against the stat-rat

10. A modified fusion model

11. Summary

APPENDIX A (An Asymptotic Case)

APPENDIX B (Morra)

    1.  Game of "Morra"

    2.  Solutions

BIBLIOGRAPHY


TABLES


TABLE 1 (Stat-rat Strategies)

TABLE 2 (3x3 Symmetric Games)

TABLE 3 (Static Morra)

TABLE 4 (Static-10 Game)

TABLE 5 (Static-9 Game)


ON GAME-LEARNING THEORY
AND SOME DECISION-MAKING EXPERIMENTS


Merrill  M.  Flood 
The  RAND  Corporation 

This paper reports on games in which a player
learns to improve his strategy during the course of a
sequence of plays. The fusion model developed by Bush
and Mosteller to explain observed behavior of rats in
experimental learning situations was used as the basis
for both theoretical and experimental investigation of
the efficiency of this type of stochastic process in
learning to play games. The experiments reported here
were with human subjects. Their game-learning performance
was compared with that of the "stat-rat", represented
by the fusion model with numerical values of the
parameters estimated to fit experimental data for rats.

The theoretical models accept basic assumptions of
von Neumann-Morgenstern game theory and Bush-Mosteller
learning theory. The theoretical and experimental
results are directly relevant for any situation in which
a sequence of decisions is made.

1. Introduction

The theory of games [1]* provides a general mathematical
model that may sometimes be used to approximate a real situation.
Usually, in real cases, the situation is much too complicated
to permit its formulation even conceptually as a formal game.
And in the few cases that can be so formulated, it is almost
always impractical to attempt gathering the necessary data or to
do the elaborate calculations required for a solution.

The non-constant-sum case, even with two players, remains
unsolved in the sense of von Neumann-Morgenstern. There are
theoretical proposals that dispose reasonably well of the two-person
case, and of many other broad special cases; I have discussed
some of these in another paper [2].


* Bracketed numbers refer to the bibliography at the end of the
paper.

I am indebted to Dr. D. R. Fulkerson for a careful reading of
the manuscript, and for many helpful comments during the course
of the work.


In this paper I investigate game-like situations in
which the players are limited biologically in their choices of
moves. These limitations are reflected in the method of play of
the formal game and stem from the notion that animal organisms
seem to learn by some sort of conditioning process that alters
the probability that some one of several mutually exclusive
alternatives will be selected in each new instance.

This approach was suggested to me by the work of R. F. Bales
and A. S. Householder [3] on the group interaction process, and
is closely connected with the work of R. R. Bush and C. F. Mosteller
[4] on mathematical models for learning. I have profited from
discussions with all four of these men. There is also an interesting
philosophical discussion of stochastic learning models in
a recent paper by D. M. MacKay [5], and a stimulating essay by
E. G. Boring [6] on "robotology"; both of these latter papers seem to me
to support the methodological viewpoint that I have adopted.

2. Generalities

The approach used in this paper is applicable to situations
involving more than two organisms, but I shall concentrate somewhat
on the two-player case. A player could in fact be a group
of people, or a component of personality within one individual,
but I shall only touch upon such interpretations occasionally.
Since my main object is to treat some one case of real behavior,
I shall usually be content with a discussion in terms of a special
real-life situation, leaving broader interpretations to the reader.

The connection with game theory is the correspondence
between the notion of choice of a strategy for a game in normal
form and the notion of [illegible in source]
activity. A very fundamental [illegible in source] the
simplest one, is the problem of choosing whether or not to act
in a situation where there appears to be only one choice: acting
or not acting. For example, in experiments like those of B. F.
Skinner [7] with rats, the choice at some moment is whether or
not to press a bar. For a human example, the choice might be
whether or not to accept a particular offer for a new position.
In these examples, and in most real-life situations, the organism
somehow reduces its range of alternatives to a relatively small
number from which it feels it must choose;* it is this recognized
field of choices, whether they are considered to be conscious
or unconscious alternatives, that corresponds to the set of strategies
listed in the normal form of the formal game.

In what follows, I shall try to use such terms as game,
strategy, move, and player from game theory only with the meaning
attached formally by von Neumann and Morgenstern [1]; for other
purposes I shall use alternative words such as situation, plan,
act, and subject.


3. The problem

The games in which we shall be interested are defined in
terms of expectation functions:

$$V^j_i = V^j_{i_1 i_2 \cdots i_n} \quad \text{for } (i_k = 1, 2, \dots, m_k;\ j, k = 1, 2, \dots, n).$$


* This process has been most systematized for the art of decision
by military commanders [8].


A play of the game consists of simultaneous independent choices
of specific values for the $i_k$ by the $n$ players; the quantity
$V^j_i$ is the expectation for player $j$, where the units for $V^j_i$
relate to a measure of the utility attached by player $j$ to the
payments he receives. The functions $V^j_i$ are real-valued.

The actual payment to player $j$, when the choice of pure
strategies is $i$ on a play, is a quantity $x$ given by the distribution
function $P^j_i(x)$ whose mean value is $V^j_i$; of course $|x|$
is bounded, so that

$$P^j_i(x) = 0 \quad \text{if } |x| > b^j_i.$$

Our problem is to select a good method of play that can be used
by player $j$ when his information about the structure of the game
is knowledge only of $m_j$ and a bound $\max_i b^j_i > 0$,
and where his information about the distribution functions $P^j_i(x)$
is gained entirely from his experience while playing the game.

It is assumed, in the process of passing from the normal to
the extended form of a game, that only the mean values of these
distributions affect the situation [1]; we can assume without
essential loss of generality, therefore, that the variance of
$P^j_i(x)$ is zero so that $x$ only assumes the value $V^j_i$. Furthermore,
since the problem is essentially unchanged if the utility measure
$x$ is subjected to a linear transformation [1] we may take $b^j_i = 1$ and
suppose that $P^j_i(x) = 0$ if $x < 0$ or $x > 1$.

The experience gained by player $j$ in $N$ plays of the game
consists of a record of his own choices $i_j(t)$, and receipts
$r_j(t)$ for $t = 1, 2, \dots, N$. The central problem is to find a rule
of play that will tend to maximize total receipts in a sequence
of plays where the player is given some information about the
number of plays before he starts on a sequence; I am intentionally
vague at this point in the paper about the exact nature of
the rule of play and about the advance information concerning the
length of sequence.

We shall be interested in what follows, then, only in the
game whose normal form has the expectation functions

$$0 \le V^j_i \le 1.$$

We shall be especially concerned with one extended form of this
game in which there is one chance move for each player and the
actual payments are always unity or zero, whence the probability
of a unit payment to player $j$ is $V^j_i$ if the players choose the
pure strategies $i$. We have noted that any game can be reduced
to this form by suitable linear transformations on the utility
measures of the individual players, provided only that there are
known finite bounds on the possible payments; use was also made
of the assumption that games in extended form are equivalent if
their normal forms are identical.
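
The reduction to unit-or-zero payments can be illustrated with a minimal sketch. It assumes a payment x known to lie in [-b, b]; the function names `to_unit_utility` and `unit_payment` are hypothetical and do not come from the paper. The payment is rescaled linearly onto [0, 1], and a chance move then pays one unit with that probability, so the mean payment is preserved.

```python
import random


def to_unit_utility(x, b):
    """Linearly rescale a payment x in [-b, b] onto the interval [0, 1]."""
    if b <= 0:
        raise ValueError("the bound b must be positive")
    if abs(x) > b:
        raise ValueError("payment lies outside the stated bound")
    return (x + b) / (2 * b)


def unit_payment(x, b, rng):
    """Chance move: pay 1 with probability equal to the rescaled utility, else 0."""
    return 1 if rng.random() < to_unit_utility(x, b) else 0


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [unit_payment(1.0, 2.0, rng) for _ in range(10000)]
    # The long-run frequency of unit payments approximates the rescaled utility 0.75.
    print(sum(draws) / len(draws))
```

Only the mean of the payment distribution enters the normal form, which is why replacing a bounded payment by a unit payment of equal expectation leaves the game unchanged for the purposes of this section.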

4. The game-learning model

The type of rule of play that is investigated here is
represented for player $j$ by the relation

$$p^j(t+1) = M^{jk}\, p^j(t) \quad \text{for } k = 0, 1, 2, \dots, 2m_j + 1,$$

where $p^j(t)$ is a vector of probabilities and the $M^{jk}$ are matrices.
The $M^{jk}$ are necessarily stochastic matrices such that the elements
are non-negative and, in any column, sum to unity.

The components $p^j_i(t)$, for $i = 0, 1, 2, \dots, m_j$, are the probabilities
that player $j$ selects value $i$ for $i_j$ on play $t$; $p^j_0(t)$ has a
special significance to be discussed later. The elements
$M^{jk}_{\alpha\beta}$ of $M^{jk}$, for $\alpha, \beta = 0, 1, 2, \dots, m_j$, are given real numbers. The
probability that $M^{jk}$ is applied after play $t$ depends only upon
$p^j(t)$, and certain constants to be discussed, and so the procedure
is a Markov process.

We now describe how the appropriate operator $M^{jk}$ is chosen
after play $t$. The matrices are first separated into two
classes of $m_j + 1$ members each, denoted $R^{jk}$ and $P^{jk}$ for $k = 0, 1, 2,
\dots, m_j$, and either $R^{jk}$ or $P^{jk}$ with $k = i_j(t)$ is selected; the choice
between these two matrices is made with probability $r_j(t)$ in favor
of $R^{jk}$.

We have now defined a method of play that can be used by any
player after he has made his initial strategic choice $p^j(0)$, and
after specific values have been assigned for the elements of $M^{jk}$.
In actual practice he will need to know $m_j$ and $\max_i b^j_i$ also.
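
One play under a rule of this type can be sketched as follows. The sketch rests on stated assumptions: the operators are column-stochastic matrices applied as M p, the receipt lies in [0, 1], and the receipt itself is used as the probability of selecting the R-class operator for the choice just made. The names `learning_step`, `R`, `P`, and `receipt` are illustrative only.

```python
import random


def learning_step(p, R, P, receipt, rng):
    """Sample a choice i from p, pick R[i] or P[i], and return (i, updated p).

    p       -- list of probabilities over the choices 0..m
    R, P    -- lists of column-stochastic matrices (lists of rows), one per choice
    receipt -- function mapping the choice i to a receipt in [0, 1] (assumed)
    """
    n = len(p)
    i = rng.choices(range(n), weights=p)[0]
    # Assumption: the receipt is the probability of applying the R-class operator.
    M = R[i] if rng.random() < receipt(i) else P[i]
    new_p = [sum(M[a][b] * p[b] for b in range(n)) for a in range(n)]
    return i, new_p


if __name__ == "__main__":
    rng = random.Random(0)
    keep = [[1.0, 0.0], [0.0, 1.0]]   # leaves p unchanged
    mix = [[0.9, 0.1], [0.1, 0.9]]    # columns sum to one
    print(learning_step([0.5, 0.5], [keep, keep], [mix, mix], lambda i: 0.5, rng))
```

Because each column of an operator sums to unity, the updated vector is again a probability vector, which is what makes the sequence of vectors a Markov process on the probability simplex.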

5. The fusion game-learning model

We shall now consider a special parametric form for the
matrices $M^{jk}$. For convenience, we shall omit the designation of
the player when this leads to no ambiguity.

We set:

[equations illegible in source]

for $(i, \alpha, \beta = 0, 1, 2, \dots, m)$, where $a_i$, $b_i$, $c_i$, and $d_i$ are in the
closed interval (0, 1). This special form of the more general
Markov game-learning process was developed by Bush and Mosteller
[4c] so as to fit data obtained in a number of learning experiments
with rats; they have named it the "fusion model."

* $\delta_{\alpha\beta}$ is the well-known Kronecker delta, and is unity or zero
according as $\alpha$ and $\beta$ are or are not equal.

We shall be interested in the case in which all but one
player, say number 1, choose constant strategies $p^j$, but where
player 1 uses the fusion model; this means that the probability
that player $j > 1$ will select the value $i$ for $i_j$ is $p^j_i$ on each play,
and for player 1 it is $p^1_i(t)$. Since these choices are all made
independently it follows immediately that the expectation for
player 1, if he chooses the value $\alpha$ for $i_1$ on play $t$, is:

$$G_\alpha = \sum_{i_2, \dots, i_n} V^1_{\alpha i_2 \cdots i_n}\, p^2_{i_2} \cdots p^n_{i_n}.$$

We may suppose, without loss of generality, that:

$$G_{m_1} \ge G_{m_1 - 1} \ge \cdots \ge G_1.$$

It follows that player 1 could do no better than to choose

$$p^1_i = \delta_{i m_1},$$

so that his expectancy on each play is $G_{m_1}$, and we shall be
interested in comparing his expectation when he makes use of the
fusion model with this maximum possible expectation $G_{m_1}$; of course,
his success with the fusion model may also depend upon his choice
of the starting vector $p^1(0)$.
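
The expectations $G_\alpha$ against fixed opponents can be computed directly from the definitions. The following is a minimal sketch, assuming zero-based strategy indices and an expectation table stored as a dictionary keyed by the tuple of all players' choices; the names `expectations` and `best_pure_strategy` are hypothetical.

```python
def expectations(V, opponents):
    """Return the list G, where G[a] is player 1's expectation for pure strategy a.

    V         -- dict mapping (a, i2, ..., in) to player 1's expectation
    opponents -- list of probability vectors, one per opposing player
    """
    m1 = max(key[0] for key in V) + 1
    G = [0.0] * m1
    for key, value in V.items():
        a, rest = key[0], key[1:]
        weight = 1.0
        for probs, i in zip(opponents, rest):
            weight *= probs[i]          # opponents choose independently
        G[a] += value * weight
    return G


def best_pure_strategy(G):
    """Index of the largest expectation; no mixture can exceed it."""
    return max(range(len(G)), key=lambda a: G[a])


if __name__ == "__main__":
    # Two-player example: player 1 is paid one unit when the choices match.
    V = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
    G = expectations(V, [[0.25, 0.75]])
    print(G, best_pure_strategy(G))
```

Since the expectation of any mixed strategy is a weighted average of the $G_\alpha$, it cannot exceed the largest of them, which is the benchmark against which the learning model is compared.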

The component $p^j_0(t)$ is interpreted in the game situation as
the probability that no choice will be made by player $j$ at time $t$,
and this is called "thinking." This feature can be introduced
mathematically into the game model by defining:

$$V^j_i = 0 \quad \text{if any component of } i \text{ is zero},$$

where the range of $i_k$ has been extended to include zero. This
augmentation of the original game problem will be used whenever
the fusion game-learning model is under discussion.


6. A special fusion model

We shall now consider a specialization of the fusion model
in which $a_i$, $b_i$, $c_i$, and $d_i$ are positive and independent of $i$; we
denote their common values $a$, $b$, $c$, and $d$. We shall also suppose
that the quantities $G_\alpha$ for $\alpha = 1, 2, \dots, m$ are distinct and non-zero,
and that $G_0 = 0$.


The expected value of $p^{t+1}$, given $p^t$, is:

[equation illegible in source]

This can be rewritten to yield the following relation for the
$k$th component of the expected value of $p^{t+1}$:

[equation illegible in source]

for $k = 0, 1, 2, \dots, m$.

It follows easily that a vector $V$ satisfies the equation
$QV = V$ if and only if it is the unit vector $e_0$ or has the following
[remainder illegible in source]

Bush and Mosteller [4b] have called the matrix $Q$ the "expected
operator" and they have made use of the vectors $V^k$ in discussing
the asymptotic behavior of $p^t$.

Rather little is known concerning this asymptotic behavior,
and still less is known about methods for estimating the parameters
in the fusion model from experimental data, so we shall
have to resort to Monte Carlo computational methods in our
exploration of the properties of the fusion model. For this
purpose, we shall turn to some numerical examples.

7. A rat experiment

The rat Su must choose one of two rooms. Five seconds after
a warning bell the rat is either rewarded (fed) or punished
(shocked) by the experimenter Ex, and Su does not see what the
result would have been had Su chosen the other room on that particular
trial.

Ex rewards or punishes according to a rule prescribed in
advance. The experimental situation may be summarily described
as a two-person game. The payoff matrix for Su is:


                        Ex places food in:
                        Room 1    Room 2
Su sits in:   Room 1       1        -y
              Room 2      -y         1

At the moment, I am not interested in the payoff matrix for Ex
but shall simply suppose that Ex has chosen a strategy $(\pi_1, \pi_2)$
such that Ex places the food in Room 1 $100\pi_1$ per cent of the time.


The expected payoff for Su is then
$V(p_1) = p_1 (2\pi_1 - 1)(1 + y) + 1 - \pi_1 (1 + y)$,
where $y$ is non-negative and $p_1$ denotes the proportion of the
time Su sits in Room 1. Now in this situation, even if Su had
superior human intelligence, game theory would give Su little
real help in choosing a strategy because there is no meaning to
attach to the notion of payoff matrix for Ex; even if there were
a payoff matrix for Ex, the game would probably be non-constant
sum and the value of $y$ would vary from Su to Su, vary from
time to time, and be difficult to estimate. Nevertheless, a
rat or a human found in this situation does behave in some fashion,
and our scientific problem is to explain and predict actual
behavior as well as possible.
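
The dependence of the expected payoff on $p_1$ can be checked numerically with a minimal sketch. It assumes that Su always chooses one of the two rooms, that the payoff is 1 when Su sits in the room containing the food and $-y$ otherwise, and that Ex places the food independently of Su's choice; the function name `expected_payoff` is hypothetical.

```python
def expected_payoff(p1, pi1, y):
    """Expected payoff to Su when sitting in Room 1 with probability p1.

    pi1 -- probability that Ex places the food in Room 1
    y   -- non-negative size of the punishment
    """
    match = p1 * pi1 + (1 - p1) * (1 - pi1)   # probability Su finds the food
    return match - y * (1 - match)


if __name__ == "__main__":
    for pi1 in (0.5, 0.8):
        print(pi1, [expected_payoff(p1, pi1, 1.0) for p1 in (0.0, 0.5, 1.0)])
```

The payoff is linear in $p_1$, so it is constant when $\pi_1 = 1/2$ and is otherwise maximized at one of the endpoints $p_1 = 0$ or $p_1 = 1$.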

Before turning to the game-learning theory approach, it may
be instructive to discuss the situation in the usual manner,
from the standpoint of rationality. For example, if $\pi_1 = 1/2$, it
does not matter what Su does, since the result is independent of
his choices.* If $\pi_1 \neq 1/2$, then Su should choose $p_1 = 1$ or $p_1 = 0$
according as $\pi_1 > 1/2$ or $\pi_1 < 1/2$. On the other hand, if Su feels
that its past behavior (including its biological characteristics)
may be analyzed intelligently by an Ex that strives to minimize
the payoff to Su by choosing a time-dependent strategy in terms
of past behavior of Su, then Su should somehow protect against
this unwanted result by concealing its pattern of behavior from
Ex (perhaps by randomization as proposed in the theory of games).
These dynamic cases are entirely outside the scope of present
formal game theory, of course, and are the principal cases of
interest here.

* Perhaps this is a way out of the so-called free-will dilemma.
It leaves lots of room for the conscious conviction of choice
with no requirement that main trends be affected thereby! From
a game-theoretic standpoint this corresponds to the case in
which all available pure strategies are included in a solution.

The basic assumption from learning theory is that Su varies
its behavior according to the pattern of its past experience.
The special mathematical form assumed here for this effect is
the special fusion model of §6, with $m = 2$. The matrices $R^i$
and $P^i$ are, therefore:

[matrices illegible in source]

$R^1$ is used after Room 1 is chosen when the food was placed there,
and $P^1$ is used after Room 1 is chosen when the food was not placed
there. Of course, Ex, as player 2, may be represented by the
[text missing in source]

The [illegible in source] correspond exactly to the [illegible in source].
In this case, Ted Harris has shown [Appendix A] that the asymptotic value of
$p^1(t)$ is

$$e_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{if } 0 < \pi_1 < 1,$$

whatever the value of $p^1(0)$; indeed, the probability is one that
there will eventually be an unbroken sequence of applications
of [illegible in source] that terminates the process.

Monte Carlo computations were made for this model with
numerical values for the parameters, chosen in agreement with
estimates by Bush and Mosteller [4c] on the basis of data from
learning experiments with rats, as follows:

$$a = b = d = 0.01, \quad c = 0.10.$$

These computations were made for a rather careless assortment of
values for $p^1(0)$ and $\pi_1$, and the main results are shown in
Tables 1A-1G. All the computed cases show a strong tendency for
$p_0$ to seek an equilibrium near the value 0.1, and it is interesting
also that $p_2$ seemed always to go to zero when $\pi_1 > 0.3$; this
constitutes a tendency toward optimal behavior since only choice
of Rooms 1 and 2 represent actual decisions by the rat under our
interpretation of the fusion model. There is not enough data in
Table 1 to permit any real analysis of the moments of the distribution
of $p^t$.
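
A Monte Carlo run of this general kind can be sketched as follows. This is not the fusion model: the sketch substitutes a generic linear learning operator, $p \leftarrow (1 - \theta)p + \theta \cdot \text{target}$, omits the "thinking" component, and uses hypothetical step sizes `theta_reward` and `theta_punish`. It is meant only to show the structure of such a computation for the two-room situation.

```python
import random


def simulate(pi1, p1_start=0.5, theta_reward=0.10, theta_punish=0.01,
             steps=200, seed=0):
    """Return the history of p1, the probability that the learner picks Room 1.

    Assumed update (NOT the paper's R and P matrices): after a reward the
    learner moves toward the room just chosen; after a punishment it moves
    toward the other room.
    """
    rng = random.Random(seed)
    p1 = p1_start
    history = [p1]
    for _ in range(steps):
        chose_room1 = rng.random() < p1
        food_in_room1 = rng.random() < pi1
        rewarded = chose_room1 == food_in_room1
        toward_room1 = chose_room1 if rewarded else not chose_room1
        theta = theta_reward if rewarded else theta_punish
        target = 1.0 if toward_room1 else 0.0
        p1 = (1 - theta) * p1 + theta * target
        history.append(p1)
    return history


if __name__ == "__main__":
    for pi1 in (0.5, 0.9):
        run = simulate(pi1, seed=1)
        print(pi1, round(run[0], 3), round(run[100], 3), round(run[200], 3))
```

Repeating such runs over several seeds and starting vectors gives the kind of trajectory summaries tabulated below, although the numbers produced by this substitute operator are not comparable with the paper's values.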


TABLE 1

Stat-rat Strategies

("?" marks an entry that is illegible in the source.)


1A: $\pi_1 = 0.5$

    t     p_0    p_1    p_2
    0     900     50     50
    5     895     58     47
   10     900     53     46
   50     900     52     48
   60     692    281     25
   90     355    374    271
    ?     193    472    335
    ?     111    589    299
    ?     139    722      ?


[label illegible in source]: $\pi_1 = 0.5$

    t     p_0    p_1    p_2
    ?       0    500    500
    ?      19    579    402
    ?      46    617    337
    ?       ?      ?      ?
    ?     106      ?      ?
    ?      97    869      ?
    ?      97    897      4


[label illegible in source]

    t     p_0    p_1    p_2
    0     300    100    100
    5     717    196     87
   10     717    202     80
   30     398    337    265
   60     337    497    165
   90       ?    606    247
  120     116    815     69
  150      98    893      9
  180     115    623    262
  210      98    371     31
  240      88    906      6


1E: $\pi_1 = 0.33$

    t     p_0    p_1    p_2
    0     800    100    100
    5     721    193     86
    ?     656    262     81
   50     411    439    151
   60     239    115     45
   90     118    875      7
  120      98    901      1


1F: $\pi_1 = 0.9$

    t     p_0    p_1    p_2
    0     900    100      0
    5     735    265      0
   10     744    256      0
   30     593    406      0
   60     126    874      0
   90     106    894      0
  120      90    908      0
  150     111    889      0
  180      92    908      0
    ?       ?      ?      0


1G: $\pi_1 = 1.0$

    t     p_0    p_1    p_2
    0     114    100    786
    5     109     96    795
   10     104    179    715
   30     107    716    177
   60      99    892      8
   90     107    892      1


8. Human subjects

All that has been said about the game-learning model is
applicable in the analysis of experimental data with human subjects.
There may be a considerable advantage in using human subjects
since the conditions of the experiment can be explained
to them easily, and because their choices are made quite rapidly.
Some very tentative trials were run in order to gain some experience
with the experimental situation, as a first step toward a
design of more conclusive trials.

In the first series the subject Su was asked to call "head"
or "tail" in an attempt to match the random choice made by Ex
with fixed probability $\pi_1$. The success of Su was compared with
that of the special fusion model (stat-rat) used in §7, where
random numbers were used to yield $\pi_1 = 0.55$. The number of trials
was too small to permit any quantitative conclusion to be drawn,
but it seemed likely that a more extensive series of trials would
be worthwhile. The scheme was discontinued in favor of more
promising ones to be discussed next.

The general zero-sum symmetric game, which has no pure
strategy as a solution, is represented by a three-parameter
payoff matrix for its normal form:

$$\begin{pmatrix} 0 & u & -v \\ -u & 0 & w \\ v & -w & 0 \end{pmatrix}$$

where $u$, $v$, and $w$ are positive. The solution of this game is
the unique mixed strategy:

$$\frac{1}{u + v + w}\,(w, v, u).$$
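
The stated solution can be verified with a minimal sketch using exact rational arithmetic. The function names `symmetric_game_solution` and `payoffs_against_pure` are hypothetical, and integer values of u, v, w are assumed for convenience. The check is that the mixed strategy earns exactly zero, the value of a symmetric zero-sum game, against every pure strategy of the opponent.

```python
from fractions import Fraction


def symmetric_game_solution(u, v, w):
    """Mixed strategy (w, v, u) / (u + v + w) for positive u, v, w."""
    total = Fraction(u + v + w)
    return [Fraction(w) / total, Fraction(v) / total, Fraction(u) / total]


def payoffs_against_pure(u, v, w, x):
    """Row player's expected payoff for mixture x against each pure column."""
    A = [[0, u, -v],
         [-u, 0, w],
         [v, -w, 0]]
    return [sum(x[i] * A[i][j] for i in range(3)) for j in range(3)]


if __name__ == "__main__":
    x = symmetric_game_solution(2, 3, 5)
    print(x, payoffs_against_pure(2, 3, 5, x))
```

Because the payoff matrix is skew-symmetric, a strategy that guarantees zero against every column is optimal for both players, which is what the all-zero result demonstrates.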


Three trials were made in which a subject played a 3x3 zero-sum
symmetric game against the stat-rat, as defined numerically for
the special fusion model in §7, where:

(a) The absolute values for $u$, $v$, and $w$ were taken
directly from a table of random numbers,

(b) The subject was not told the payoff matrix but was
told the exact method used to select it,

(c) The subject was told that he was playing against a
rat in mathematical form.

The three payoff functions were:

[table illegible in source]

These three games each have a pure strategy for a solution, as
shown in the final column of the table just above. The results
of the three trials are summarized in Table 2. The trend in the
stat-rat's mixed strategy is shown for each game in Table 2, along
with an estimate of the mixed strategy in use by the subject
based on the average of his ten choices centered at the play
listed. Again the data are too few to justify careful analysis,
or quantitative conclusions, and this type of trial was discontinued
in favor of a more promising one to be discussed next.


TABLE 2

3x3 Symmetric Games

("?" marks an entry that is illegible in the source.)

                     RF    Stat-rat    KD    Stat-rat    KF    Stat-rat
No. of plays         19       19       25       25       20       20
No. of wins           6       13        ?        ?       10        ?
Percentage wins      32       68       20       60       50        ?

Frequency of Use of Solution: [entries illegible in source]

For the stat-rat the frequency through play x is computed
as the ratio $p^x \div (1 - p^x)$ (subscripts illegible in source),
and for the subjects it is one-tenth the number of
wins in the ten plays centered at play x.


There are really two rather different types of problems
involved thus far in our discussions of game-learning situations
in which the payoff functions are unknown:

(a) (Static) Those situations in which it is assumed that
the opponents of the main player choose and use a
fixed mixed strategy for the duration of a sequence
of plays,

(b) (Dynamic) Those situations in which it is assumed that
the opponents of the main player may vary their strategic
behavior during the sequence in a manner that somehow
takes account of the results they obtain on earlier
plays.

The static case, with known payoff functions, is the usual one
considered [1] whereas our interest centers here on the dynamic
case with essentially unknown payoff functions. The static case
always reduces to one in which there is a set of numbers $G_\alpha$ in
the closed interval (0, 1), that represents the payment expectation
if our main player chooses pure strategy $\alpha$ on a given play,
and the numbers $G_\alpha$ remain constant for the sequence of plays; the
dynamic case takes the same form except that the numbers $G_\alpha$ may
vary in some manner that is dependent upon the choices made by
our main player in preceding plays. The game-learning model is
equally applicable in either the static or the dynamic case.

The game of Morra is a convenient one for our purposes both
because it is of handy size (9x9) and because it has been completely
solved.* [1] The static case was examined experimentally
for Morra by having two subjects and the stat-rat play the game
knowing only that it was 9x9 and symmetric, and that their opponent
would not be using a game-theoretic solution. Subject BC
has no knowledge of game theory and Subject RB is a mathematician
who is expert in game and decision theory. Actually, a fixed pure
strategy, not in the solution mixture, was used in opposing the
subjects and the stat-rat; it is represented by the following set
of values for $G_\alpha$ that were used against BC and the stat-rat,
those for RB being 2/5 as great:

$$G_\alpha = (.500, .500, .855, .250, .250, .500, .500, .500, 1.000).$$

The results of play are summarized in Table 3. The data are still
too skimpy to permit any conclusions to be drawn. There is no
particular reason, in the static case at least, why the experimental
values chosen for $G_\alpha$ should come from a game that has a
well-known extended form. Consequently, we have come to the
following type of experiment as the most promising one to use in
obtaining data on the behavior of human subjects to be used in
estimating parameters in the fusion models and thus eventually to
test the hypothesis that this mathematical model represents human
learning behavior. The $G_\alpha$ are chosen from a random-number table
and the subject then is asked to play a number of times fixed in
advance in an effort to maximize his total number of wins; RB
and AM, both experts in the relevant mathematical theories, served
as subjects for trials in which there were 1,000 plays and:

$G_\alpha$ = (.04, .05, .25, .?1, .64, .55, .44, .44, .75, [remainder illegible in source]

* Morra is discussed in Appendix B.


TABLE 3

Static Morra

                     RB      BC     Stat-rat 1   Stat-rat 2
No. of plays         67*     372*       29           43
No. of wins          24      213        14           21
Percentage wins      40       67        48           49

Frequency of Use of Solution, Through Play
(row and column labels illegible in source; values in source order, two per row)

                     .100    .100
                     .077    .186
                     .058    .155
                     .030    .260
                     .01?    .304

* The subject announced at this point that he would continue
playing pure strategy No. 9 indefinitely, and so had in effect
"solved" the game. RB made this tentative decision after 32
plays and used the other 35 simply to confirm his decision.


The results are given in Table 4; it seems unlikely that the stat-rat
would match this performance, but it would probably take a
good many trials to give a statistically significant test of
this conjecture.

One static-nine game has been played by the stat-rat with
ten replications. In this play, $p^0_\alpha = 0.1$ for $\alpha = 0, 1, \dots, 9$, and

[line illegible in source]

$$G_\alpha = (.097, .5??, .133, .274, .442, .364, .503, .529, .9?6).$$

[Beginning of sentence illegible in source] the computations were
carried to eight decimal places for two hundred steps each. In
nine out of the ten cases, the value of $p^{200}$ was essentially such
that $p^{200}_9 = 0.9$, $p^{200}_0 = 0.1$; in other words, the stat-rat had reached
the optimum strategy in 200 trials. In the tenth case the value
of $p^{200}$, if rounded off at the third decimal place, was essentially
[component illegible in source], $p^{200}_0 = 0.1$; in other words, the stat-rat had reached a
very poor strategy in 200 trials. Other details of this run of
static-nine are given in Table 5, including the "winning rate"
$w^t$ which represents the expectation on the first decision after
time $t$, where:

$$w^t = \frac{\sum_{i=1}^{9} p^t_i\, G_i}{\sum_{i=1}^{9} p^t_i}.$$
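
The winning rate is the expectation conditional on a decision actually being made, that is, with the "thinking" probability removed. A minimal sketch follows, assuming that position 0 of each list is reserved for the thinking component; the function name `winning_rate` and the numbers in the example are illustrative and are not taken from the tables.

```python
def winning_rate(p, G):
    """Expectation on the next decision, given that a decision is made.

    p -- probability vector with p[0] the probability of "thinking"
    G -- expectations for each choice; G[0] is ignored
    """
    decided = sum(p[1:])
    if decided == 0:
        raise ValueError("no probability is assigned to any real decision")
    return sum(pi * gi for pi, gi in zip(p[1:], G[1:])) / decided


if __name__ == "__main__":
    # Illustrative values only: one thinking component and two real choices.
    print(winning_rate([0.1, 0.9, 0.0], [0.0, 0.5, 0.25]))
```

When all of the non-thinking probability sits on a single strategy, the winning rate equals that strategy's expectation, which is why a run that settles on one strategy shows a constant winning rate.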


The dynamic case is perhaps the most interesting one experimentally,
especially where a fusion model is pitted against a

TABLE 4

Static-10 Game

                                   RB        AM
No. of plays                     1,000     1,000
No. of wins*                       719       715
Percentage wins                   71.9      71.5
No. of last play before
deciding permanently on
strategy 9                         133       204

* Based on expectation after decision
to play strategy 9

TABLE 5

Static-9 Game

("?" marks an entry that is illegible in the source; "-" marks a level never reached.)

                        Average                      Game Number
Item                    excl.
                        Game 9     1     2     3     4     5     6     7     8     9    10

Time at which winning
rate first exceeded:
  .84                     59      29    58    88    98     ?    72    58     ?     ?    36
  .90                     81      41    88    99   111    76    92    68     ?     -    36

Winning rate at time:
   50                     70      87    84    73    44    58    62    59    74    36    92
  100                     91      90    93    92    90    82    93    91    93    36     ?
  200                     93      93    93    93    93    93    93    93    93    36    93

Percentage wins in:
  100 decisions            ?      63    85    72    57    57    75    65    75    38    86
  200 decisions           82      80    89    83    75    75    83    80    83     ?    9?

p_0^100                  .100   .106  .091  .092  .106  .101  .093  .104  .095     ?     ?

p_0^200                  .097   .090  .100  .092  .105  .105  .100  .092  .093     ?  .098

No. of thinking steps
for 200 decisions:        23      23    25     ?    29    27    18    22    21    14    29




human subject. Before going too far with such a program, it will
be necessary to develop a better mathematical understanding of the
models in order to design the experiments so as to permit
statistical significance tests to be applied; this point has been
discussed by Bush and Mosteller [4b], and they and others are
gradually developing some of the mathematical tools that are
needed. We have [...] some of these experiments with [...]
subjects, using Morra and other games [...] the purpose. It would
be interesting also to run some [...] exactly the same sort with
real rats (e.g., playing Morra against the stat-rat).


We have seen how the stat-rat is able to play any game with
known bounds on the payments, even though we have not been able
to settle the question concerning its degree of skill at games.
We shall now be interested in how best to exploit this knowledge of
the procedure used by the stat-rat in playing games when we are
its opponent. Of course, if we know the expectation functions
and have computed the solution of the game, then we can guarantee
at least a certain minimum result by choosing the game-theoretic
solution; our object is to do better than this safe solution
guarantees, and we should also like to know how to play when we
do not know the expectation functions or the theoretical solution.

As a special case, consider the ordinary game of matching
pennies. We start with the reasonable assumption that the initial
vector for the stat-rat is:

p‘(0) 




To make the game quite definite, in our usual notation, we note
that the game is usually represented by the expectation functions:

[...] = 2 [...] - 1, and [...] = 1 - 2 [...].

We transform this game into an equivalent one, in the sense that
linear transformations on the individual utility functions leave
the solutions invariant, by setting:

[...]

No chance move is really needed, in this special case, since the
values of [...] are all zero or one. Finally, we specify that
there is to be a sequence of N plays, and our problem is to choose
a method of play that will maximize our expected payments against
the stat-rat. Since we can compute the p'(t) for the stat-rat at
each stage, except for the thinking steps when $p_0$ has effect,
it is not difficult to find a method of play that gives us an
average expectation in excess of that obtained if we play
strategies 1 and 2 with equal frequencies. Such a good strategy would
be for us always to play the strategy that is less likely to be
chosen by the stat-rat. I shall not pursue this very simple
example further, except to note that it becomes immediately more
difficult if we do not know p'(0), or if the expectation functions
are represented in the equivalent form:

[...]

where 0 < [...] < 1.
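The rule of always playing the strategy the stat-rat is less likely to choose can be sketched in a few lines of Python. The sketch assumes the zero-or-one form of matching pennies in which we are paid 1 when the two choices differ and 0 when they match; that payment convention, the function names, and the sample probabilities are assumptions made for illustration.

```python
def best_reply(p_opponent):
    """Return (index, expectation) of our best pure strategy.

    Assumption: we receive 1 when our choice differs from the
    opponent's choice and 0 when it matches, so the expectation of
    playing strategy k is 1 - p_opponent[k].
    """
    expected = [1.0 - q for q in p_opponent]
    k = max(range(len(expected)), key=expected.__getitem__)
    return k, expected[k]


def equal_mix_expectation(p_opponent):
    """Expectation when we play each of our strategies equally often."""
    share = 1.0 / len(p_opponent)
    return sum(share * (1.0 - q) for q in p_opponent)


# Hypothetical stage at which the opponent favors its first strategy.
choice, value = best_reply([0.75, 0.25])
baseline = equal_mix_expectation([0.75, 0.25])
```

With two strategies the equal mixture always yields one half, so any stage at which the opponent's probabilities are unequal gives the targeted reply a strictly larger expectation.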




10. A modified fusion model


It is inconvenient to have the situation met in the special
fusion model, where [...] bounded away from [...] eventually be
applied [...] so that the [...] will [...]. [...] this difficulty,
and some others, we shall consider the following slight
modification in the special fusion model:

(a) The operator [...] is never applied.

(b) The probability of selecting pure strategy i at time t
is $p_i^t / \sum_j p_j^t$.

The expected value of the payment on one trial, starting
with $p^t$, is:

$$ S_1(p^t) = \frac{A}{B}, \quad \text{where } A = \sum_j p_j^t G_j \text{ and } B = \sum_j p_j^t . $$

We shall now be interested in comparing this expectation on the
first trial with the corresponding expectation $S_2(p^t)$ on the
second trial. For this, we have:

$$ S_2(p^t) = \frac{1}{B} \sum_{\alpha} \left[ \, p_\alpha^t G_\alpha \, S_1(r p^t + c\, e_\alpha) + p_\alpha^t (1 - G_\alpha) \, S_1(u p^t + a\, e_\alpha) \right] . $$

If $e_\alpha$ denotes the column vector with unity as its $\alpha$
component and zeros elsewhere, and where a prime is always used to
denote the transpose of a matrix (or vector), then:

$$ S_1(u p^t + a\, e_\alpha) = \frac{uA + aG_\alpha}{uB + a} $$

and

$$ S_1(r p^t + c\, e_\alpha) = \frac{rA + cG_\alpha}{rB + c} , $$

where $r = 1 - b - c$ and [...]. So

$$ S_2(p^t) = \frac{1}{B} \sum_{\alpha} \left[ \, p_\alpha^t G_\alpha \, \frac{rA + cG_\alpha}{rB + c} + p_\alpha^t (1 - G_\alpha) \, \frac{uA + aG_\alpha}{uB + a} \right] $$

and, after reduction, this becomes

$$ S_2(p^t) = \frac{A}{B} + \frac{(ra - uc)(A^2 - B A_2)}{B\,(rB + c)(uB + a)} , \quad \text{where } A_2 = \sum_j p_j^t G_j^2 . $$

The quantities in the denominator are all positive and so the
algebraic sign of the second term depends on its numerator only,
in which (ra-uc) depends only on the constant parameters. We note
that

$$ B A_2 - A^2 = \sum_{i<j} p_i^t p_j^t \, (G_i - G_j)^2 \ge 0 . $$

It follows that

$$ S_2(p^t) \ge S_1(p^t) \quad \text{if } (ra - uc) \le 0 . $$
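The comparison of the first-trial and second-trial expectations can be checked numerically. The Python sketch below works under stated assumptions rather than from a confirmed reading of the model: strategy alpha is selected with probability p_alpha / sum(p); with probability G_alpha the vector becomes r*p + c*e_alpha, otherwise it becomes u*p + a*e_alpha. The function names, the symbol A2 for sum(p_j * G_j**2), and all numerical values are illustrative. Exact fractions are used so that the two expressions can be compared without rounding.

```python
from fractions import Fraction as F


def s1(p, G):
    """First-trial expectation A / B for weights p and expectations G."""
    return sum(pi * gi for pi, gi in zip(p, G)) / sum(p)


def s2_direct(p, G, a, c, u, r):
    """Second-trial expectation by enumerating choice and outcome."""
    total = 0
    for alpha, (pa, ga) in enumerate(zip(p, G)):
        # Assumed update after the outcome of probability G_alpha.
        hit = [r * pj + (c if j == alpha else 0) for j, pj in enumerate(p)]
        # Assumed update after the outcome of probability 1 - G_alpha.
        miss = [u * pj + (a if j == alpha else 0) for j, pj in enumerate(p)]
        total += pa * ga * s1(hit, G) + pa * (1 - ga) * s1(miss, G)
    return total / sum(p)


def s2_closed(p, G, a, c, u, r):
    """Reduced form A/B + (ra - uc)(A^2 - B*A2) / (B (rB + c)(uB + a))."""
    A = sum(pi * gi for pi, gi in zip(p, G))
    B = sum(p)
    A2 = sum(pi * gi * gi for pi, gi in zip(p, G))
    return A / B + (r * a - u * c) * (A * A - B * A2) / (B * (r * B + c) * (u * B + a))


# Hypothetical weights, expectations, and parameters with ra - uc < 0.
p = [F(1, 3), F(1, 3), F(1, 3)]
G = [F(1, 5), F(1, 2), F(9, 10)]
params = dict(a=F(1, 10), c=F(3, 10), u=F(9, 10), r=F(1, 2))
gain = s2_direct(p, G, **params) - s1(p, G)
```

For these values ra - uc is negative and `gain` is non-negative, in line with the sign condition; exchanging the roles of (a, u) and (c, r) reverses the sign.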




This is an important feature for our game-learning model to
have, in order that the expectation increase with each play in
the static case.

It is obvious that

$S_1(p^t) \le$ G [...].

It is likely that $p^t$ can converge asymptotically only to a vector
of the form V = $\theta$ e [...] + (1-$\theta$) e [...], where
0 < $\theta$ < 1. One important
unsolved problem is to find the probability that $p^t$ converges to
V when the initial vector $p^0$ is given; it is reasonable to hope
that the probability is high that $p^t$ converges to V [...] when
[...], in which case the modified special fusion model represents a
game-learning process that tends asymptotically to find the optimal
pure strategy in playing a static game. If these conjectures
prove to be well founded, as our Monte Carlo computations with the
special fusion model seem to indicate they may be, there will
still be some questions in the degenerate cases in which the $G_j$
are not all distinct or some of the $p_j^0$ are taken to be zero.

11. Summary

This is a very preliminary paper. In it we have shown how
a player can "learn" during the course of a sequence of plays of
a game to improve his strategy. The fusion model developed by
Bush and Mosteller to explain observed behavior of rats in
experimental learning situations was used as the basis for both a
theoretical and an experimental investigation of the efficiency of
this type of learning process in learning to play games; the
experiments discussed here were with human subjects, and their
game-learning performance was compared with that of the "stat-rat,"
represented by the fusion model with numerical values of the
parameters estimated to fit experimental data for rats.

The theoretical models accept basic assumptions of von Neumann-
Morgenstern game theory and Bush-Mosteller learning theory,
including:

(a) Games with identical normal forms are equivalent, and
this equivalence is independent of the probability distribution
functions associated with chance moves.

(b) Games that differ only by linear transformations of the
individual payoff functions are equivalent.

(c) Learning is a Markov process.

Equivalence here means that the games have the same solutions.

The experimental results consist of Monte Carlo computations
for the stat-rat, contests between stat-rat and a human subject,
and comparisons of performance of stat-rat and a human subject
when playing the same static game. Very limited data indicate
that:

(a) The stat-rat usually learns a good strategy when a constant
mixed strategy is played against him; in Morra and the other
games played the stat-rat seemed to settle on essentially the best
strategy within 200 trials or so.

(b) A person proficient at games would win against the
stat-rat in Morra.

(c) The stat-rat does reasonably well in a static game, in
comparison with the human subject, but a statistician would
certainly defeat the stat-rat.




The theoretical results are very skimpy, the main result being
that one modified fusion model does have a non-decreasing
expectancy per trial on successive plays of the game; various open
mathematical questions are noted.

More extensive experiments are in progress, and it is hoped
that these may provide the data necessary to estimate parameter
values for human subjects and eventually to test the adequacy of
this type of Markov process for description of human learning. It
seems very unlikely now that such a Markov process will be adequate.




APPENDIX  A 
An  Asymptotic  Case 


Bush and Mosteller [40] have proposed a mathematical model
for learning that fits experimental data for rats quite well;
they have called this the "fusion model." I shall discuss only
the special three-choice case; the argument is easily extended
to their general case.

Define five matrices as follows:

[...]

Consider the Markov chain $p^{t+1} = N^t p^t$, where the probability
that the transition matrix [...] is applied at step t is [...],
and where:

[...]

The quantities a, b, c, d, [...] and $\tau_3$ are positive real
numbers in the open interval (0,1). The $p_i^t$ are the three
components of the vector $p^t$, and satisfy the conditions:

$$ 0 \le p_i^t \le 1, \qquad \sum_{i=1}^{3} p_i^t = 1 . $$


If we let $u_n$(M [...]) denote the probability that the matrix
M [...] is applied after all t > n, then we say that the process
"concludes with the matrix M [...]" if

$$ \lim_{n \to \infty} u_n = 1 . $$


Theorem 1.

The process $p^{t+1} = N^t p^t$ concludes with the matrix M [...].

Proof: The probability that some one of the three operators
[...] will be applied after play t is:

$$ r^t = \tau_1 p_1^t + \tau_2 p_2^t + \tau_3 p_3^t \ge h > 0 , $$

where

$$ h = \min_i \tau_i . $$

The three operators, when applied to $p^t$, yield
components $p_3^{t+1}$ as follows:

$$ p_3^{t+1} = (1-b-c)\,p_3^t + b , $$

$p_3^{t+1}$ = (1-b-c) $p_3^t$ + [...],

$$ p_3^{t+1} = (1-d)\,p_3^t + d . $$

Thus, in all three cases, we have:

$p_3^{t+1} \ge$ min {b, [...], d} = $\lambda$ > 0.

* This theorem and its proof are due to Ted Harris (oral
communication).


The probability that operator [...] is applied after
times t, t+1, ..., t+n-1, is:

$$ f_n(p_3^t) = p_3^t \prod_{i=1}^{n-1} \left[ \, 1 - (1 - p_3^t)(1-d)^{i} \right] . $$

To establish this relationship, note that repeated application
of [...] for n times immediately after time t yields the relation:

$$ p_3^{t+n} = u^n p_3^t + (1 - u^n) , $$

where

$$ u = 1 - d . $$

Next, note that the probability that [...] occurs at least n times
immediately after time t is, as required:

$$ f_n(p_3^t) = p_3^t \, p_3^{t+1} \cdots p_3^{t+n-1} . $$

We will be interested in the limiting case as n tends to infinity,
and next consider:

$$ f(p_3^t) = \lim_{n \to \infty} f_n(p_3^t) . $$

It follows that

$$ f_n(p_3^t) \ge f_n(x), \quad \text{if } p_3^t \ge x , $$

since each term in the product $f_n(x)$ is less than or equal to
the corresponding term in the product $f_n(p_3^t)$. In particular,
$f(p_3^t) \ge f(\lambda)$ if $p_3^t$ was obtained by application of
one of the three operators [...].
It is easily seen that the sequence $f_n(\lambda)$ is convergent, and so

$$ 1 \ge f(\lambda) > 0 . $$
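The product that gives the probability of an unbroken run can be evaluated directly, which makes its convergence to a positive limit easy to see. The Python sketch below assumes the reading of the argument in which one application of the operator carries the third component from p to (1-d)*p + d and the operator is applied with probability equal to the current third component; the function names and the sample values of p and d are illustrative.

```python
def f_n(p, d, n):
    """p * prod_{i=1}^{n-1} [1 - (1 - p) * (1 - d)**i]."""
    value = p
    for i in range(1, n):
        value *= 1.0 - (1.0 - p) * (1.0 - d) ** i
    return value


def f_n_by_iteration(p, d, n):
    """Same quantity, multiplying the successive components directly."""
    value, current = 1.0, p
    for _ in range(n):
        value *= current
        current = (1.0 - d) * current + d  # assumed one-step update
    return value


# Hypothetical starting component and parameter.
short_run = f_n(0.2, 0.3, 10)
long_run = f_n(0.2, 0.3, 200)
```

The factors approach one geometrically, so the partial products decrease with n but stay bounded away from zero; a larger starting component gives a product that is at least as large.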




We now have shown that the probability $g(p^t)$ that there
will be an unbroken sequence of applications of [...] after time t
is not less than

$$ g = h f(\lambda) > 0 . $$

We next obtain the limiting value of the probability that there
will be an unbroken sequence of applications of [...] starting
somewhere within n steps after t = 0.

For convenience, let [...] be the generic name for the matrix
operator on $p^t$. Let $t_1 < t_2 < t_3 < \cdots < t_{\varphi(N)}$ be the
values of t < N for which [...], and the process does
or does not conclude with [...] according as
$\varphi = \lim_{N \to \infty} \varphi(N)$ is or
is not finite, and we say that the process has $\varphi$ "breaks." If
we let $P(b, v \mid p^0)$ be the probability that the process has at least
b breaks and also has [...] = v, and if we let $P(b \mid p^0)$ be the
probability that the process has at least b breaks, then:

$$ P(b+1 \mid p^0) = \sum_v P(b, v \mid p^0) \, (1 - P(0 \mid v)) , $$

where $P(0 \mid v)$ is read as the probability of no break at all
when the process starts from v. Now, since

$$ \sum_v P(b, v \mid p^0) = P(b \mid p^0) \quad \text{and} \quad P(0 \mid v) \ge g > 0 , $$

it follows that

$$ P(b+1 \mid p^0) \le (1-g) \, P(b \mid p^0) . $$

Hence $\lim_{b \to \infty} P(b \mid p^0) = 0$, and the theorem follows.




APPENDIX  B 

Morra

1. Game of "Morra"

"Morra" is an example of a game involving only personal
moves. Each player shows one, two, or three fingers and
simultaneously calls his guess as to the number of fingers his
opponent will show. If just one player guesses correctly, he wins
an amount equal to the sum of the fingers shown by himself and
his opponent; otherwise the game is a draw.

This game consists of one move for each player. A strategy
for each player is a pair of numbers (s,g), where s = 1,2,3 is the
number of fingers he shows and g = 1,2,3 is his guess of the number
of fingers his opponent will show. It is evident that each player
has nine strategies, and thus there are 81 different possible
plays of the game. With each of these 81 ways of playing the
game, there is associated a payment to the players, as described
by the rules of the game. These payments are summarized by a
payoff matrix. In the following payoff matrix, the entries
represent payments to player I. Player II will receive the
negative of these payments.


"Morra"

RECEIPTS OF PLAYER I

                               Player II's strategies

                   (1,1) (1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3)

             (1,1)    0     2     2    -3     0     0    -4     0     0
             (1,2)   -2     0     0     0     3     3    -4     0     0
             (1,3)   -2     0     0    -3     0     0     0     4     4
             (2,1)    3     0     3     0    -4     0     0    -5     0
Player I's   (2,2)    0    -3     0     4     0     4     0    -5     0
strategies   (2,3)    0    -3     0     0    -4     0     5     0     5
             (3,1)    4     4     0     0     0    -5     0     0    -6
             (3,2)    0     0    -4     5     5     0     0     0    -6
             (3,3)    0     0    -4     0     0    -5     6     6     0
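The payoff matrix follows mechanically from the rules stated above, so it can be regenerated and used to test any mixed strategy. The Python sketch below builds the 9 x 9 matrix of receipts of player I and computes the expected receipt of a mixed strategy against each pure strategy of player II. The function and variable names are illustrative, and the mixture in the example (weights 5/12, 4/12, 3/12 on strategies (1,3), (2,2), (3,1)) is used here only as a sample input.

```python
from fractions import Fraction

# Strategies in the order (1,1), (1,2), ..., (3,3): (fingers shown, guess).
STRATEGIES = [(s, g) for s in (1, 2, 3) for g in (1, 2, 3)]


def receipt(mine, theirs):
    """Receipt of player I for one play of three-finger Morra."""
    (s1, g1), (s2, g2) = mine, theirs
    i_right = g1 == s2      # player I guessed II's fingers
    ii_right = g2 == s1     # player II guessed I's fingers
    if i_right == ii_right:
        return 0            # both right or both wrong: a draw
    stake = s1 + s2
    return stake if i_right else -stake


MATRIX = [[receipt(r, c) for c in STRATEGIES] for r in STRATEGIES]


def expected_receipts(mix):
    """Expected receipt of player I against each pure strategy of II."""
    return [sum(w * MATRIX[i][j] for i, w in enumerate(mix))
            for j in range(len(STRATEGIES))]


# Sample mixture on strategies 3, 5, and 7 (weights are an example input).
mix = [0, 0, Fraction(5, 12), 0, Fraction(4, 12), 0, Fraction(3, 12), 0, 0]
worst_case = min(expected_receipts(mix))
```

A mixed strategy is safe for player I when `worst_case` is not negative; since the game is symmetric, its value is zero and no strategy can guarantee more than that against every reply.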

2. Solutions

The four basic solutions of Morra are:

(a)   0,  0,  5/12,   0,  4/12,   0,  3/12,   0,  0

(b)   0,  0,  16/37,  0,  12/37,  0,  9/37,   0,  0

(c)   0,  0,  20/47,  0,  15/47,  0,  12/47,  0,  0

(d)   0,  0,  25/61,  0,  20/61,  0,  16/61,  0,  0

The optimal strategy is:

(e)   0,  0,  23/55,  0,  18/55,  0,  14/55,  0,  0

If player I uses the optimal strategy (e), he can expect to gain
at least 2/55 whenever player II departs from strategies 3, 5, or
[...]

Bales, Robert F. Interaction Process Analysis.
Addison-Wesley Press, 1950.

4. a. Bush, R. R., and Mosteller, C. F. "A Mathematical Model
for Simple Learning," Psych. Rev. (58, 5) September, 1951.

b. -----. "A Linear Operator
Model for Learning," a paper presented on 27
December 1951 at a meeting of the Institute of
Mathematical Statistics.

c. -----. "A Comparative Study
of Three Models for Simple Learning," Memorandum VI,
21 December 1950 (Unpublished).

MacKay, D. M. "Mindlike Behavior in Artefacts," The British
Journal for the Philosophy of Science (2, 6)
August, 1951, pp. 105-121.

Boring, E. G. "Mind and Mechanism," Amer. J. Psychology
(59, 2) April, 1946.

Skinner, B. F. The Behavior of Organisms. Appleton-Century-
Crofts, 1938.


a. Haywood, O. G., Jr. Military Doctrine of Decision and
the von Neumann Theory of Games. RM-528, The RAND
Corporation, 20 March 1950.

b. -----. "Military Decision and the Mathematical
Theory of Games," Air University Quarterly
Review (4, 1) [...], pp. [...].

[...] Applications of [...]


